
28 research outputs found

A multilevel layout algorithm for visualizing physical and genetic interaction networks, with emphasis on their modular organization

Author: Aittokallio T
Nevalainen OS
Salmela P
Tuikkala J
Vähämaa H
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/10/2022

Missing value imputation improves clustering and interpretation of gene expression microarray data

Author: AG de Brevern
D Wang
G Feten
H Kim
H Kuhn
H Yoshimoto
I Scheel
J Handl
J He
J Hu
J Tuikkala
JJ Wyrick
JL DeRisi
Johannes Tuikkala
Laura L Elo
M Al-Daoud
M Hirao
M Kankainen
M Ronen
M Shapira
MJ Brauer
O Troyanskaya
Olli S Nevalainen
P D'haeseleer
PT Spellman
R Jörnsten
S Oba
S Tavazoie
T Lange
Tero Aittokallio
TR Golub
X Gan
X Wang
Y Shi
Z Cai
Publication venue: BioMed Central
Publication date: 01/04/2008

BACKGROUND: Missing values frequently pose problems in gene expression microarray experiments as they can hinder downstream analysis of the datasets. While several missing value imputation approaches are available to microarray users and new ones are constantly being developed, there is no general consensus on how to choose between the different methods since their performance seems to vary drastically depending on the dataset being used. RESULTS: We show that this discrepancy can mostly be attributed to the way in which imputation methods have traditionally been developed and evaluated. By comparing a number of advanced imputation methods on recent microarray datasets, we show that even when there are marked differences in the measurement-level imputation accuracies across the datasets, these differences become negligible when the methods are evaluated in terms of how well they can reproduce the original gene clusters or their biological interpretations. Regardless of the evaluation approach, however, imputation always gave better results than ignoring missing data points or replacing them with zeros or average values, emphasizing the continued importance of using more advanced imputation methods. CONCLUSION: The results demonstrate that, while missing values are still severely complicating microarray data analysis, their impact on the discovery of biologically meaningful gene groups can, up to a certain degree, be reduced by using readily available and relatively fast imputation methods, such as the Bayesian Principal Components Algorithm (BPCA).

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Metrics for GO based protein semantic similarity: a systematic evaluation

Author: A Schlicker
A Valencia
André O Falcão
António EN Ferreira
C Pesquita
C Wu
Catia Pesquita
D Devos
D Devos
D Faria
D Lin
Daniel Faria
E Camon
EB Camon
F Azuaje
F Azuaje
F Couto
F Couto
FM Couto
Francisco M Couto
Gentleman
Hugo Bastos
J Chabalier
J Jiang
J Tuikkala
JL Sevilla
L Stein
P Lord
P Lord
P Resnik
PH Lee
RM Othman
RM Riensche
S Cao
T Joshi
X Guo
X Wu
Y Tao
Z Lei
ZH Duan
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether electronic annotations should be used in semantic similarity calculations. RESULTS: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. CONCLUSIONS: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
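The abstract names three ways of combining term-level similarities into a gene-product similarity (average, maximum, best-match average) without giving formulas. The sketch below assumes the commonly used definition of best-match average (the mean of each term's best match, taken in both directions and then averaged) and uses a hypothetical toy term similarity (`toy_sim`) in place of a real measure such as Resnik's. It is meant only to illustrate why the plain average depends on the number of terms being combined.

```python
from statistics import mean

def combine(sim, terms_a, terms_b):
    """Return (average, maximum, best-match average) over all term pairs."""
    matrix = [[sim(a, b) for b in terms_b] for a in terms_a]
    all_pairs = [s for row in matrix for s in row]
    row_best = [max(row) for row in matrix]        # best match for each term of A
    col_best = [max(col) for col in zip(*matrix)]  # best match for each term of B
    bma = (mean(row_best) + mean(col_best)) / 2    # assumed BMA definition
    return mean(all_pairs), max(all_pairs), bma

# Hypothetical term similarity (an assumption for illustration only):
# 1.0 for identical terms, 0.2 for any other pair.
def toy_sim(a, b):
    return 1.0 if a == b else 0.2

small = ["t1", "t2"]
large = ["t1", "t2", "t3", "t4"]

# Compare each annotation set with itself; only the number of terms differs.
avg_small, max_small, bma_small = combine(toy_sim, small, small)
avg_large, max_large, bma_large = combine(toy_sim, large, large)

print("average:", avg_small, avg_large)
print("maximum:", max_small, max_large)
print("best-match average:", bma_small, bma_large)
```

In this toy setting the self-similarity under the plain average drops as the annotation set grows, whereas the best-match average stays the same, which is consistent with the abstract's remark about the number of terms.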

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Universidade de Lisboa: Repositório.UL

Missing value imputation for microRNA expression data by using a GO-based similarity measure

Author: A Krek
AJ Enright
B John
BP Lewis
D Cimino
D Wang
Dandan Song
F Biagioni
G Schuler
G Yu
G Yu
H Kim
H Wu
J Lu
J Peng
J Tuikkala
JL Sevilla
JZ Wang
M Ashburner
M Kanehisa
N Qing-shan
O Troyanskaya
P Resnik
P Sethupathy
P Zhang
PW Lord
Q Xiang
R Edgar
S Griffiths-Jones
S Volinia
X Zhou
Yang Yang
Zhuangdi Xu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A literature-based similarity metric for biological processes

Author: A Hyvarinen
A Tanay
AA Petti
AB Maxfield
AG Fraser
AH Tong
Alberto Pascual-Montano
CD Powell
Concha Gil
D Chaussabel
D Lin
D Martin
DD Lee
DE Levin
DM Blei
E Ravasz
EA Adie
G Weeks
H Shatkay
HS Carr
J Tuikkala
Jose M Carazo
L Giot
LH Hartwell
M Ashburner
M Chagoyen
M Vidal
MF Porter
Monica Chagoyen
NJ Krogan
O Bodenreider
P Glenisson
P Khatri
P Pehkonen
P Resnik
P Resnik
Pedro Carmona-Saez
PV Ogren
PW Lord
PW Lord
R Homayouni
RB Cattell
S Deerwester
S Deerwester
S Myhre
T Hofmann
T Sekito
T Yu
U Alon
VL Boyartchuk
X Wu
Z Bar-Joseph
ZN Oltvai
Publication venue: BioMed Central
Publication date: 01/07/2006

BACKGROUND: Recent analyses in systems biology pursue the discovery of functional modules within the cell. Recognition of such modules requires the integrative analysis of genome-wide experimental data together with available functional schemes. Along these lines, methods are required to bridge the gap between the abstract definitions of cellular processes in current schemes and the interlinked nature of biological networks. RESULTS: This work explores the use of the scientific literature to establish potential relationships among cellular processes. To this end we have used a document-based similarity method to compute pair-wise similarities of the biological processes described in the Gene Ontology (GO). The method has been applied to the biological processes annotated for the Saccharomyces cerevisiae genome. We compared our results with similarities obtained with two ontology-based metrics, as well as with gene product annotation relationships. We show that the literature-based metric conserves most direct ontological relationships, while revealing biologically sound similarities that are not obtained using ontology-based metrics and/or genome annotation. CONCLUSION: The scientific literature is a valuable source of information from which to compute similarities among biological processes. The associations discovered by literature analysis are a valuable complement to those encoded in existing functional schemes, and those that arise by genome annotation. These similarities can be used to conveniently map the interlinked structure of cellular processes in a particular organism.

Crossref

Directory of Open Access Journals

PubMed Central

Digital.CSIC

Missing value imputation for microarray gene expression data using histone acetylation information

Author: AA Alizadeh
AL Clayton
AP Gasch
C Rich
Caisheng He
CM Perou
D Schubeler
DE Koryakov
DJ Duggan
DK Pokholok
E Segal
GC Yuan
GCLY Yuan
H Kim
H Yoshimoto
HY Yu
I Takemasa
J Tuikkala
JA Orr
Jiang Wang
Jihua Feng
JJ Hu
JL DeRisi
JL Schafer
KJ Kim
KW McCool
L Mariño-Ramírez
L Narlikar
L Verdone
M Ouyang
MB Eisen
MD Meneghini
MPS Brown
MS Kobor
MSB Sehgal
O Alter
O Alter
O Troyanskaya
OJ Rando
P Johansson
P Spellman
Qian Xiang
RJA Little
S Chatterjee
S Oba
S Raychaudhuri
SA Armstrong
SC Kim
SK Kurdistani
TR Golub
TR O'Connor
TY Roh
X Feng
X Guo
Xianhua Dai
Yangyang Deng
Zhiming Dai
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Accurately estimating missing values is an important pre-processing step for microarray data, because complete datasets are required in numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performance is not satisfactory for datasets with high missing percentages. RESULTS: The paper explores the feasibility of missing value imputation with the help of gene regulatory mechanisms. An imputation framework called the histone acetylation information aided imputation method (HAIimpute) is presented. It incorporates histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for the final prediction of the missing values. The experimental results indicated that the use of acetylation information can provide significant improvements in microarray imputation accuracy. The HAIimpute methods consistently improve on widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). Meanwhile, the genes imputed by HAIimpute methods are more correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, an existing related method that uses functional similarity as the external information. CONCLUSION: We demonstrated that the use of histone acetylation information could greatly improve imputation performance, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance. Moreover, as more knowledge accumulates on gene regulatory mechanisms beyond histone acetylation, the performance of our approach can be further improved and verified.
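The abstract reports accuracy as NRMSE but does not state the formula. The sketch below assumes one definition that is common in the imputation literature: the root mean squared error over the artificially removed entries, divided by the standard deviation of their true values. The function name and the toy numbers are illustrative assumptions, not taken from the paper.

```python
from math import sqrt
from statistics import mean, pvariance

def nrmse(true_values, imputed_values):
    """RMSE over the imputed entries, normalised by the standard deviation
    of the true values (assumed definition; variants exist)."""
    squared_errors = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return sqrt(mean(squared_errors) / pvariance(true_values))

# Toy example: four entries that were masked and then imputed.
true_values = [1.0, 2.0, 3.0, 4.0]

perfect = nrmse(true_values, true_values)                # exact recovery
mean_fill = nrmse(true_values, [mean(true_values)] * 4)  # fill with the mean

print("perfect imputation:", perfect)
print("mean imputation:", mean_fill)
```

Under this definition, filling every missing entry with the mean of the true values gives an NRMSE of 1, so values below 1 indicate that a method does better than that naive baseline.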

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central